ABSTRACT

The goal of pE-DB (http://pedb.vib.be) is to serve as 
an openly accessible database for the deposition of 
structural ensembles of intrinsically disordered 
proteins (IDPs) and of denatured proteins based on 
nuclear magnetic resonance spectroscopy, small- 
angle X-ray scattering and other data measured in 
solution. Owing to the inherent flexibility of IDPs, 
solution techniques are particularly appropriate for 
characterizing their biophysical properties, and 
structural ensembles in agreement with these data 
provide a convenient tool for describing the 
underlying conformational sampling. Database 
entries consist of (i) primary experimental data 
with descriptions of the acquisition methods and 
algorithms used for the ensemble calculations, and 
(ii) the structural ensembles consistent with these 
data, provided as a set of models in Protein Data
Bank format. pE-DB is open for submissions from
the community, and is intended as a forum for 
disseminating the structural ensembles and the 
methodologies used to generate them. While the 
need to represent the IDP structures is clear, 
methods for determining and evaluating the struc- 
tural ensembles are still evolving. The availability of 
the pE-DB database is expected to promote the 
development of new modeling methods and lead
to a better understanding of how function arises 
from disordered states. 

INTRODUCTION 

Intrinsically disordered proteins (IDPs) or intrinsically 
disordered regions within otherwise structured proteins 
are defined by the lack of a single static tertiary structure 
under physiological conditions (1-4). These proteins 
have multiple conformations that are separated by low 
free-energy barriers and consequently their structures
constantly fluctuate between different states, giving rise 
to a dynamic ensemble of conformations. Disordered 
regions are ubiquitous in proteins involved in biological 
processes of DNA and RNA binding, transcription, trans- 
lation, cell-cycle regulation and membrane fusion, and 
also often in pathologies associated with misfolding and 
aggregation, as observed in a variety of neurodegenerative 
diseases (5) and in the pathogenesis of many other human 
maladies (6). These regions may function as entropic 
chains (such as flexible linkers between folded domains 
or chains that exhibit elastomeric properties) or by tran- 
sient (often modulated by posttranslational modifications) 
or permanent (such as scaffolds or effectors) partner 
binding (1-4). On binding, some IDPs gain a stable 
folded structure (i.e. folding on binding), while others 
retain much flexibility, forming 'fuzzy' complexes (7). 

The existence and functioning of IDPs defy the classical 
structure-function paradigm and pose a serious concep- 
tual challenge to understand how function derives from 
transitions between ensembles of disordered states and 
more limited conformations when bound to their biolo- 
gical targets. Experimentally, the disorder of IDPs has 
been traditionally inferred from residues missing in X- 
ray structures, Kratky plots from small-angle X-ray scat- 
tering (SAXS) measurements, data from nuclear magnetic 
resonance (NMR) experiments and a range of low-resolution
techniques, such as circular dichroism, fluorescence
and infrared spectroscopy (6,8). Structural disorder can
also be predicted computationally from the primary 
sequence, as disordered regions are enriched in specific 
disorder-promoting amino acids, such as Gly, Pro and 
charged residues, and depleted in order-promoting, 
mostly hydrophobic, amino acids (9,10). One of the 
most pressing and potentially rewarding challenges in 
the IDP field is to improve the experimental and compu- 
tational methods to describe the structural and dynamic 
properties of IDPs and elucidate how their functions are 
mediated by their disordered states, which is anticipated to 
bring the advent of 'unstructural biology' (4). 

Based on NMR and SAXS measurements, structural 
ensembles only started appearing in the literature ~10 
years ago (Table 1). These structural ensembles are still 
often criticized as being models that fit experimental ob- 
servations but lack physical reality. It is difficult to argue 
against this critique because the structural ensembles 
themselves often are not deposited on publication, and
only conclusions based on their analysis are described.
Further, the various computational approaches
proposed for the calculation of the structural ensembles
have never been critically assessed and compared. We
propose to remedy this situation by launching pE-DB,
which provides access to the primary experimental
data, the algorithms used in their calculation and the co- 
ordinates of the structural ensembles themselves. We en- 
courage the community to deposit structural ensembles of 
novel proteins and even to recalculate ensembles based on 
the primary experimental data. 

pE-DB is complementary to other disorder-related data- 
bases, such as DisProt (16), the database of binary disorder 
classification based on biophysical data, and two sequence-based
disorder databases, D2P2 (17), which holds disorder
predictions, and IDEAL (18), which contains manually 
curated annotations of IDP location, structure and func- 
tional sites. pE-DB is most closely related to Biological 
Magnetic Resonance Bank (BMRB) (19), which hosts 
primary NMR data linked to pE-DB, but no other type 
of experimental data or structural ensembles. pE-DB also 
has an interesting relationship with Protein Data Bank 
(PDB) (20), the major structural database that hosts X- 
ray- and NMR-derived structures of folded (ordered) 
proteins, resting on the principle that a protein has a 
single 'real' structure. Last but not least, pE-DB has a re- 
semblance to the Ensemble Protein Database (http://www. 
epdb.pitt.edu/), which, however, holds sets of structures of 
folded proteins generated by computer simulation. 

In the context of these related databases, pE-DB 
provides a forum for the deposition of models of struc- 
tural ensembles of IDPs, which predictably will provide a 
platform for critical evaluation of ensemble calculation 
methods and eventually lead to the development of experi- 
mental and computational standards and protocols that 
will become accepted in the IDP field and beyond. We 
believe creating and publishing the database will stimulate 
the community to submit their data, and we hope to see a 
rapid increase in the entries/ensembles/structures de- 
posited. We are committed to stimulating the field to 
grow and to eventually reach a state of deposition being 
the condition of acceptance of IDP structural work. We 
are convinced that this initiative offers the rich reward of 
bringing the IDP field to maturity through understanding 
the structural underpinning of IDP function in physiology 
and disease, with the ultimate prospect of developing 
novel drugs targeting IDPs involved in disease (21,22). 



Table 1. Examples of recent structural ensembles, their underlying primary experimental data and computational methods developed to calculate them

Protein             Ensemble calculation  Constraint(s)                                                                           Reference
α-synuclein         MD                    PREs                                                                                    (11)
DrkN SH3            ENSEMBLE              CSs, 15N R2, RDC, PRE, J-couplings, NOEs, O2-derived 13C paramagnetic shifts, Rh, SAXS  (12)
NTAIL Measles       FM, ASTEROIDS         RDCs, PREs                                                                              (13)
p27-KID             MD                    SAXS, AUC, NMR                                                                          (14)
pSic1/Cdc4 complex  ENSEMBLE              CSs, 15N R2, RDC, PRE, SAXS                                                             (15)
Tau K18             FM, ASTEROIDS         RDCs, PREs                                                                              (13)

This table is not intended to be exhaustive, but only presents ensembles that contributed to the development of the concept and method development. 

APPROACHING STRUCTURAL ENSEMBLES

Although a structural description of IDPs is not feasible
using X-ray crystallography, other techniques,
such as NMR experiments measuring chemical shifts
(CSs), residual dipolar couplings (RDCs), 15N R2 relaxation
rates, paramagnetic relaxation enhancement (PRE)
distance restraints, J-couplings, pulsed field gradient
(PFG)-derived hydrodynamic radius (Rh) values, 1H-15N
heteronuclear nuclear Overhauser effects and O2 (or other
paramagnetic compound)-derived measures of accessibility,
and SAXS measurements can yield meaningful information
on the distribution of their shape and size, short- and long-range
contacts and backbone flexibility (23-25). CSs, the
first output of any NMR characterization of an IDP, 
provide secondary structural propensities. These can 
nicely be compared with results of predictors and provide 
robust information about the structural and dynamic het- 
erogeneity of a protein. NMR methods are under continu- 
ous development to enable the study of IDPs of increasing 
size and complexity (26). The information derived from 
NMR, combined with that available from SAXS, can be 
used to describe the structure of an IDP as an ensemble of 
conformations (24,25). There are two broad approaches to 
generating disordered state ensembles that fit experimental 
data (27). The first, called replica-averaged MD (28), is to
drive molecular dynamics (MD) simulations so that a set of
structures fits the data. The second involves
the generation of a large number of conformations and 
selection of a subset that best fits the available data. 

In the first approach, MD simulations are carried out to 
sample the conformational space accessible to a given 
protein. As the current force fields, however, do not 
provide exact representations of the interatomic inter- 
actions, the conformational space explored during the 
simulations is often not consistent with the available ex- 
perimental measurements. To overcome this problem, an 
additional term is introduced in the force field that penal- 
izes the deviations between the experimental measure- 
ments and the corresponding values back-calculated 
from the structures sampled during the simulations (11). 
This method is consistent with the maximum entropy prin- 
ciple, and thus provides the minimal modification of the 
force field required to obtain a conformational sampling 
consistent with the experimental data used as restraints 
(28). It is, however, not guaranteed to generate ensembles 
of structures consistent with experimental data not used as 
restraints, a result that would be achieved only when a 
sufficient number of restraints are used (29-32). 
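
The restraint term described above can be illustrated with a minimal sketch. It assumes a harmonic penalty on the deviation between each experimental value and the corresponding back-calculated value averaged over replicas; the function name, the force constant `k` and the harmonic form are illustrative assumptions and do not reproduce the implementation of any particular MD package.

```python
from statistics import fmean


def replica_averaged_penalty(calc_per_replica, experimental, k=1.0):
    """Harmonic penalty on replica-averaged observables (illustrative form).

    calc_per_replica: list of replicas, each a list of back-calculated values
    experimental:     list of measured values, one per observable
    k:                hypothetical force constant weighting the restraint
    """
    n_obs = len(experimental)
    if any(len(rep) != n_obs for rep in calc_per_replica):
        raise ValueError("each replica needs one value per observable")
    # Average each observable over all replicas before comparing to experiment.
    averaged = [fmean(rep[i] for rep in calc_per_replica) for i in range(n_obs)]
    return k * sum((a - e) ** 2 for a, e in zip(averaged, experimental))
```

In this form the penalty vanishes whenever the replica average matches the measurement, even if no single replica does, which is the sense in which the restraint acts on the ensemble rather than on individual structures.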

In the second approach, the procedure of ensemble cal- 
culation starts with generating a pool of a vast number of 
conformations. These conformations may be completely 
random or may already be constrained by experimental or 
theoretical data such as φ/ψ angles or secondary structure
propensities. The programs most commonly used for this 
step are Flexible-Meccano (FM) (33), ensemble optimiza- 
tion method (EOM) (23,24) and TRaDES (34,35). MD 
simulations may also be used to provide a starting pool. 
The conformers generated may need to be completed; for
example, FM conformers lack side chains, which need to be
modeled in with an algorithm such as SCCOMP (36) or
SCWRL (37). After generating the starting pool, experimental
data are back-calculated from the conformers to
enable a direct comparison with actual observations. For 
SAXS data, programs are available, e.g. CRYSOL (38), to 
calculate scattering curves for each individual conformer. 
For NMR data, FM can estimate CSs [using ShiftX,
SPARTA (39) or related CS prediction approaches],
RDCs (using local alignment combined with long-range
effects modulating RDC baselines, or global alignment),
PREs (accounting for local and long-range correlation
times), SAXS curves (using CRYSOL) and J-coupling values for
the generated conformer pools; alternatively, ENSEMBLE (40)
can be used. ENSEMBLE uses CRYSOL for SAXS data,
HYDROPRO (41) for NMR-derived Rh data, ShiftX
(42) for CS data, a local-alignment approach (43) for
RDCs and internal scripts for solvent accessibility,
PREs, J-couplings, R2 relaxation rates and nuclear
Overhauser effect (NOE) values.

The aim of the ensemble calculation is to select a subset 
of conformers whose back-calculated values fit the actual 
experimental data coming from SAXS and NMR measurements.
The software Gajoe, part of EOM, selects from
the pool the conformers whose theoretical SAXS curves best fit
the experimental curve. The program
ASTEROIDS (25,44) starts from the statistical coil model
derived from FM, and selects ensembles, iteratively
repopulating underlying potential energy landscapes and
recalculating all experimental data from each newly
calculated ensemble. The approach uses a genetic algorithm
to converge to ensembles whose elements are different
in each ensemble, but that are in equal agreement with
the experimental data, within the level of the experimental 
noise. The approach makes extensive use of cross-valid- 
ation of data that are not used in the selection procedure 
to generally test the predictive nature of the approach and 
to guard against over-fitting. ENSEMBLE (40) similarly 
can select a subset of conformers on the basis of SAXS 
and a variety of different NMR data. The size of the final 
ensembles may range from only a few to hundreds of con- 
formers. We note that while it is tempting to interpret each 
member of the ensemble as an existing conformational 
substate, it is important to remember that ensemble de- 
scriptions can only be considered as discrete representa- 
tions of highly complex probability distribution functions. 
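
The pool-and-select idea can be sketched in a few lines. The example below is a deliberately simple greedy selection that minimizes a reduced chi-square between ensemble-averaged back-calculated values and the experimental data; it is not the genetic algorithm of ASTEROIDS nor the procedure used by Gajoe or ENSEMBLE, and all function names are hypothetical.

```python
def chi_square(averaged, experimental, sigma):
    """Reduced chi-square between averaged and measured observables."""
    terms = (((a - e) / s) ** 2 for a, e, s in zip(averaged, experimental, sigma))
    return sum(terms) / len(experimental)


def ensemble_average(pool, members):
    """Average each back-calculated observable over the chosen conformers."""
    n_obs = len(pool[0])
    return [sum(pool[m][i] for m in members) / len(members) for i in range(n_obs)]


def greedy_select(pool, experimental, sigma, size):
    """Greedily add the conformer that most improves agreement with the data.

    pool: one list of back-calculated observables per conformer
    Returns the selected conformer indices and the final chi-square.
    """
    if size < 1 or not pool:
        raise ValueError("need a non-empty pool and size >= 1")
    selected = []
    for _ in range(min(size, len(pool))):
        best, best_chi = None, float("inf")
        for idx in range(len(pool)):
            if idx in selected:
                continue
            trial = ensemble_average(pool, selected + [idx])
            chi = chi_square(trial, experimental, sigma)
            if chi < best_chi:
                best, best_chi = idx, chi
        selected.append(best)
    return selected, chi_square(ensemble_average(pool, selected), experimental, sigma)
```

A greedy search can stall in local minima and returns only one of the many ensembles that may fit equally well, which is one reason stochastic searches and repeated runs are preferred in practice.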

The challenge that we face when calculating structural 
ensembles is to demonstrate that they provide an accurate 
representation of the range of conformations explored by 
proteins during their thermal fluctuations. It must be 
acknowledged that the ensemble description of IDPs has 
not yet reached the rigor of other protein structure discip- 
lines, and thus has to be treated with care, although we 
must not forget either that PDB structures are also models 
describing experimental observations. First, the quality of 
the final ensemble depends strongly on the quality of the 
experimental data. Aggregation, degradation or sample 
purity issues can severely affect the reliability of measurements
and hence of the corresponding ensembles. In the case
of techniques such as SAXS, experiments always yield results
that can be interpreted, even when the sample is compromised,
so data have to be carefully examined
and controlled. Although the predictive nature of the different
ensemble approaches can in principle be tested
against data that are not included in the selection, this is
rarely done in practice. Furthermore, if insufficient data 
are used (as is invariably the case), there could be multiple 
structural ensembles that are equally consistent with them, 
hence preventing an unambiguous answer to the problem 
of determining the correct structural ensemble. In
addition, given the large number of degrees of freedom
and the astronomical number of potential structures an
IDP visits, multiple different ensembles can always be
computed that describe the experimental data with the
same level of agreement; this will happen in all circumstances,
as it is inherent to ensembles of disordered
proteins. Despite these ambiguities, due to constraints
coming from SAXS and NMR, the ensembles have to 
show similarities in hydrodynamic behavior and also in 
local structural preferences. The level of similarity, 
however, has to be established, and the purpose of pE- 
DB is to help resolve these issues and drive the develop- 
ment of robust methodologies and concepts for deriving 
physically realistic structural ensembles. 

DATABASE STRUCTURE AND CONTENT 

pE-DB is implemented as a relational MySQL database 
that consists of a core set of generic tables storing meta- 
information and dedicated modules for NMR and SAXS 
experimental parameters (Figure 1). The core tables 
record information on the proteins used in the experiments
(e.g. sequence, molecular weight, mutations and
posttranslational modifications), cross-links to
relevant databases, such as UniProt (45), Ensembl (46),
BMRB (19) or DisProt (16), the organisms and expression
systems used and meta-information regarding the authors
and, if applicable, related publications. The SAXS and
NMR modules consist of multiple tables recording the 
complete description of the experiments. 

Database entries have unique four-letter identifiers that 
are the primary keys used to link related tables to the core 
table. These identifiers connect the meta-information 
recorded in the database and the actual data files stored 
on the pE-DB file server. Three types of data files are 
stored locally: NMR-related values, i.e. lists of CSs, 
RDCs, PREs or J-couplings, scattering curves from 
SAXS measurements and sets of structural ensemble files 
in PDB format. Ensembles consist of a few dozen to 
hundreds (and possibly even more) of conformers and 
each entry may have more than one ensemble associated
with it, since multiple ensembles may fit the experimental
data equally well. 
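
The layout described above, with a core table keyed by a four-letter identifier and dependent tables for ensembles and data files, can be sketched as follows. This is only an illustration: it uses Python's built-in `sqlite3` with an in-memory database rather than MySQL, and every table name, column name and the identifier `DEMO` are hypothetical. The actual data scheme is the one published under the 'Documentation' section of the Web site.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical pE-DB-like layout."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE entry (               -- stands in for the core table
            pedb_id TEXT PRIMARY KEY CHECK (length(pedb_id) = 4),
            title   TEXT NOT NULL
        );
        CREATE TABLE ensemble (            -- an entry may hold several ensembles
            pedb_id      TEXT NOT NULL REFERENCES entry(pedb_id),
            ensemble_no  INTEGER NOT NULL,
            n_conformers INTEGER NOT NULL,
            PRIMARY KEY (pedb_id, ensemble_no)
        );
        CREATE TABLE nmr_data (            -- pointers to files on the file server
            pedb_id   TEXT NOT NULL REFERENCES entry(pedb_id),
            data_type TEXT NOT NULL,       -- e.g. 'CS', 'RDC', 'PRE'
            file_path TEXT NOT NULL
        );
    """)
    return con


def ensembles_for(con, pedb_id):
    """List (ensemble number, conformer count) pairs for one identifier."""
    rows = con.execute(
        "SELECT ensemble_no, n_conformers FROM ensemble "
        "WHERE pedb_id = ? ORDER BY ensemble_no",
        (pedb_id,),
    )
    return rows.fetchall()
```

Keeping the identifier as the only link between tables means that meta-information, ensembles and data files can each be retrieved independently from the same key.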

The database is open to submissions from the commu- 
nity and researchers are encouraged to submit their data 
to pE-DB using the online submission interface. Data sub- 
mission is initiated by filling out a pre-submission form 
describing briefly the experiments and providing related 
publications, if applicable. Data can also be submitted 
before publication; in such cases, the entry will be 
released only after the date specified by the authors. Pre-submission
forms are processed and, if found suitable, the
pE-DB team contacts the submitters requesting additional
data and information. Submitters are required to provide
meta-information by filling out an online submission
form, followed by uploading their experimental data and
structural ensembles via secure FTP connection.
Submissions are manually curated by experts in the field
and only ensembles based on high-quality experimental
data are considered for deposition.

[Figure 1: schematic of the data model. The pE-DB core table (meta-information, ensemble parameters) is linked to an NMR module (sample conditions, experimental parameters, NMR device information, chemical shifts, PREs, RDCs, J-couplings) and a SAXS module (sample conditions, experimental parameters, beamline information, scattering intensities, output values, e.g. Rg, Dmax, I0).]

Figure 1. Structure of the pE-DB database. The relational data model of pE-DB consists of a set of tables organizing modules, all connected to the
main table recording the four-letter unique pE-DB identifier. Supported data types have dedicated table sets, storing relevant information to provide
full description of the structural ensembles, the calculation procedures and the underlying experimental data. The complete data scheme is available
online under the 'Documentation' section.



USER INTERFACE AND WEB SITE FEATURES 

Searching and browsing 

The online user interface of pE-DB provides support for 
accessing data in multiple ways from browsing and quick 
searches to bulk downloads, complex queries and SQL 
commands. 

pE-DB can be browsed according to different criteria, 
such as accession identifier, protein name and data type. 
Selecting any of these options leads to a list of pE-DB 
entries with relevant information depending on the 
selected browsing option. The number of entries per
page can be specified using the scroll window next to the
'Browse by' label and pressing the 'Go' button.

Searching the database can be done by typing the query 
string at the 'Search' section at the top of the window. By 
default, this will search entries with any type of data and 
in every string category. Optionally, the type of the string 
can be specified with the scroll-down button next to the 
text field. The experimental data type can also be
specified using the bottom menu of the section under 
the text field. Using the advanced search interface, an 
arbitrary number of query strings can be used. Again, 
the type of the string and the experimental data type can 
be specified, and users need to specify the Boolean 
operator (AND/OR). Both search methods return a list 
of matching entries with brief descriptions and direct links 
to download data (Figure 2). 
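
How several query strings combine under a Boolean operator can be made concrete with a small sketch. The two records mirror the entries shown in the search results of Figure 2; the record layout and the function names are hypothetical and say nothing about how the pE-DB server implements its search.

```python
ENTRIES = [
    {"id": "6AAC",
     "title": "Ensemble description of K18 domain of Tau protein using NMR techniques",
     "data_types": ["NMR"]},
    {"id": "7AAC",
     "title": "Ensemble of the free form N-TAIL Measles nucleoprotein",
     "data_types": ["NMR"]},
]


def matches(entry, terms, operator="AND"):
    """Case-insensitive substring match of all (AND) or any (OR) terms."""
    title = entry["title"].lower()
    hits = [term.lower() in title for term in terms]
    if operator == "AND":
        return all(hits)
    if operator == "OR":
        return any(hits)
    raise ValueError("operator must be 'AND' or 'OR'")


def search(entries, terms, operator="AND", data_type=None):
    """Return identifiers of entries matching the terms and optional data type."""
    return [
        e["id"]
        for e in entries
        if matches(e, terms, operator)
        and (data_type is None or data_type in e["data_types"])
    ]
```

For instance, `search(ENTRIES, ["tau", "measles"], "OR")` matches both records, whereas the same terms combined with `"AND"` match neither.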

[Figure 2: screenshot of the search results screen, listing two entries: 6AAC, 'Ensemble description of K18 domain of Tau protein using NMR techniques', and 7AAC, 'Ensemble of the free form N-TAIL Measles nucleoprotein'.]

Figure 2. Search results in pE-DB. The basic search field or the advanced search option takes the user to the 'Search results' screen. Here, entries
corresponding to the search query are listed, displaying the title of the accessions, the pE-DB identifiers, authors and the underlying data types of the
ensembles. A sample screenshot of one conformer from an ensemble is shown on the right side. Direct download links to the sequences, experimental
data, structural ensembles and the complete archives can be found on the left side.

Advanced users may perform complex searches by using
an online SQL terminal. The data scheme of the database
required for formulating selection queries can be found
under the 'Documentation' section of the Web site.
Users may only carry out 'SELECT' type commands.
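
A restriction to 'SELECT' type commands could be checked along the following lines. This is a conservative sketch under stated assumptions, not a description of how the pE-DB terminal enforces the rule: the function name is hypothetical, and the check simply rejects anything that does not start with SELECT or that contains more than one statement.

```python
import re


def is_select_only(sql):
    """Accept a single statement that begins with the SELECT keyword."""
    stripped = sql.strip().rstrip(";").strip()
    if ";" in stripped:
        # A remaining semicolon suggests several statements; reject them all.
        return False
    return re.match(r"(?i)select\b", stripped) is not None
```

A check of this kind is only a first filter; a read-only database account is the more robust way to guarantee that queries cannot modify data.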

Data retrieval 

The key for data retrieval from pE-DB is the unique iden- 
tifier of each accession. In case of single entry downloads, 
users may navigate to the accession screen using any of the 
methods detailed above and select from various download 
options, i.e. downloading the complete data archive, only 
specific data types, sequences or structural ensembles. 

Bulk downloading can be done by navigating to the 
'Download pE-DB' section on the Web site. Here, the 
complete pE-DB can be downloaded as a flat SQL file or a
tab-separated .tsv file. NMR, SAXS and structural data
along with nonredundant sequences (in FASTA format) 
may also be retrieved. By providing a list of pE-DB iden- 
tifiers, users may download sets of sequences, experimen- 
tal data, structural archives as well as complete entries. 



ACCESSION SCREEN AND JMOL APPLET 

The accession screen displays the available meta-information 
for a specific entry and provides direct download links to the 
experimental data and the structural ensembles (Figure 3). 
By default, only the 'General information' section is
expanded; users may view other sections by pressing the
'Show/Hide' button found at the top right of each section. 

The general information section displays the authors, a 
brief description of the entry and the data types used as 
constraints for the ensemble calculations. Below this 
section is a preview gallery of some of the conformers 
found in the ensembles. The left figure shows the most
compact conformer, the middle figure shows a conformer
close to the average Rg of the ensembles, while the right
figure displays the most extended conformer. Clicking on
any of these figures leads to a new window where users
may find each ensemble and each conformer with its corresponding
radius of gyration (Rg) and Dmax values. Each
conformer can be visualized using a built-in customizable
Jmol applet (Figure 4) (47,48).
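
The quantities used to build the preview gallery can be computed directly from conformer coordinates. The sketch below assumes that each conformer is given as a list of (x, y, z) tuples, for example the C-alpha positions, and that all atoms carry equal weight; the function names are hypothetical and the code is not taken from pE-DB.

```python
import math


def radius_of_gyration(coords):
    """Unweighted radius of gyration of a list of (x, y, z) points."""
    n = len(coords)
    centroid = tuple(sum(p[i] for p in coords) / n for i in range(3))
    return math.sqrt(sum(math.dist(p, centroid) ** 2 for p in coords) / n)


def max_distance(coords):
    """Largest pairwise distance (Dmax) between any two points."""
    return max(
        (math.dist(a, b) for i, a in enumerate(coords) for b in coords[i + 1:]),
        default=0.0,
    )


def pick_gallery(conformers):
    """Indices of the most compact, most average and most extended conformer."""
    rg = [radius_of_gyration(c) for c in conformers]
    mean_rg = sum(rg) / len(rg)
    compact = min(range(len(rg)), key=rg.__getitem__)
    extended = max(range(len(rg)), key=rg.__getitem__)
    average = min(range(len(rg)), key=lambda i: abs(rg[i] - mean_rg))
    return compact, average, extended
```

The pairwise search in `max_distance` scales quadratically with the number of atoms, which is acceptable for single conformers but worth replacing with a faster method for very large ensembles.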

The SAXS and NMR sections display experimental par- 
ameters and settings, as well as links to download the data 
archives, and in the case of SAXS data to visualize the 
scattering data with normalized Kratky plots, P(r) 
distance distribution plots, Guinier-plots and the scattering 
curve itself. In the case of NMR data, since CSs are the 
primary requirement of any NMR investigation, and thus 
always available, these are used to produce secondary 
structural propensity plots that indicate the propensity of 
different parts of the polypeptide chain to adopt secondary 
structural conformations. These are easy to inspect, rich 
in information on the structural and dynamic properties
of a protein and can be compared with results of predictors, 
all features that are going to stimulate further progress. 
If applicable, a link to the corresponding BMRB entry is 
provided. 
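
The SAXS plots mentioned above rest on two standard transformations, sketched here in plain Python. The Guinier approximation, ln I(q) = ln I(0) - (Rg^2/3) q^2, is fitted by linear regression on low-q points to estimate Rg and I(0); the normalized Kratky plot then shows (q Rg)^2 I(q)/I(0) against q Rg. The function names are hypothetical, and the choice of which low-q points to include is left to the caller.

```python
import math


def guinier_fit(q, intensity):
    """Estimate (Rg, I0) from a linear fit of ln I(q) against q^2."""
    x = [qi * qi for qi in q]
    y = [math.log(ii) for ii in intensity]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
    if slope >= 0:
        raise ValueError("Guinier slope must be negative")
    intercept = mean_y - slope * mean_x
    return math.sqrt(-3.0 * slope), math.exp(intercept)


def normalized_kratky(q, intensity, rg, i0):
    """Return (q*Rg, (q*Rg)^2 * I(q)/I0) pairs for a dimensionless Kratky plot."""
    return [(qi * rg, (qi * rg) ** 2 * ii / i0) for qi, ii in zip(q, intensity)]
```

Because the Guinier approximation holds only at small q Rg, the fit should be restricted to the lowest-angle part of the curve, and that range is typically narrower for disordered chains than for compact particles.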

At the bottom of each accession is a dedicated discus- 
sion section, where registered users are encouraged to 
share their thoughts on the entry, the techniques used 
and the underlying data. Registration is fast, free and 
requires only a valid e-mail address. 



AVAILABILITY 

The database is freely available at http://pedb.vib.be. We 
encourage users to register free accounts, to be able to
engage in discussions about the ensembles and their underlying
calculation techniques at the discussion section of each
entry. However, every other functionality of the database
from complex queries to SQL command support and bulk
download is accessible without the need for registration.

[Figure 3: screenshot of the Jmol applet screen, showing the Rg distribution of an ensemble and a table listing ensemble number, ensemble size, average Rg and average Dmax, with a 'Show/Hide' button for the list of conformers.]

Figure 3. Jmol applet and list of conformers. Entries in pE-DB may have multiple ensembles, which may fit equally well the underlying experimental
data. By navigating to the Jmol applet screen, the user can view the Rg distribution of each ensemble, the number of conformers and the average
values for the Rg and the maximal distance (Dmax). By clicking on the 'Show/Hide' button, a list of the conformers appears, featuring Rg
and Dmax values and a Jmol button. Clicking on the Jmol button, every single conformer can be selected to be visualized by a fully customizable
Jmol applet.

[Figure 4: screenshot of the entry screen for accession 5AAC, 'Dynamic complex of the intrinsically disordered phosphorylated Sic1 with the Cdc4 subunit of an SCF ubiquitin ligase'.]

Figure 4. pE-DB entry screen. pE-DB entries display all the available meta-information for each accession, direct download links to various data
types, sequences and structural ensembles, and a selection of figures and plots to visualize the data. The top field includes a table of contents on the
left, with clickable links to the different sections and a sample figure that is a link to the Jmol applet used to visualize each conformation in the
ensemble. The general information section contains a brief description of the entry and the list of the authors. The image gallery shows three
conformers from the ensembles, one with the lowest radius of gyration (Rg) value, one with an Rg value closest to the ensemble average and a
conformer with the highest Rg. These figures are clickable links leading to the Jmol screen. Below the gallery, different sections can be found, which
are hidden by default, but can be opened by pressing the 'Show/Hide' buttons. At the bottom of each entry is a dedicated discussion section, where
users may comment on the entry, sharing their thoughts on the ensembles, the underlying data or the calculation method.

CONCLUSIONS AND OUTLOOK 

We believe that the establishment of pE-DB represents a 
cornerstone in the evolution of the IDP field, opening the 
way to assessing and perfecting methodologies for the 
structural descriptions of the disordered state, a goal 
which is critical for developing quantitative structure- 
function models of IDPs (4,27). In this new era of struc- 
tural biology, description of biomolecules as single static 
structures is increasingly recognized as being inadequate 
for understanding function. Rather, proteins must be 
described as ensembles of thermally accessible conformers. 
Since the pE-DB database represents a radical break with 
our traditional ways of looking at protein structures, it 
may also provoke novel modes of structure visualization 
addressing the multiplicity and dynamics of structures, 
such as by videos or continuum spatial functions. It
follows from these notions that pE-DB will be complementary
to more traditional databases, such as BMRB
(19), which is mandated to host NMR-measurable information
but not structural ensembles, and PDB (20), which
is mandated to handle well-defined structure coordinates
derived from experiment only. Neither PDB nor BMRB has the
mandate or the capacity to handle the type of information
contained in pE-DB, in which ensembles can be
generated from NMR, SAXS, single-molecule fluorescence
and other non-NMR techniques and are represented as
model coordinate data rather than well-defined structure
coordinates; such model coordinates are not accepted by PDB either.

One has to be aware, of course, that the ensembles are 
not precise or complete representations of disordered states 
but rather models that fit a specifically defined subset of 
data, and unique solutions cannot be expected owing to the 
extreme conformational freedom of IDPs and the limited 
data (12,49). The more data we can incorporate into model 
building, however, the more realistic the ensemble will be, 
and the major ambition of pE-DB is to help stimulate and 
guide this process. For best results, it is important to use data of different 
types (e.g. at least some data on local or 
secondary structural propensities such as CSs, some data 
on global hydrodynamic properties such as SAXS or 
NMR PFG-derived Rh and some data on specific tertiary 
contacts such as PRE) because ensembles calculated 
with data from only a certain class will have limitations (e.g. 
a SAXS-refined ensemble will not provide information 
about secondary structural elements, a limitation also encountered 
in PRE-refined ensembles; conversely, ensembles refined with 
residue-specific information (CSs and RDCs) will not 
properly describe a PRE profile or a SAXS curve). 
Therefore, to help avoid overinterpretation, it is important 
to define (1) which data types and (2) how many restraints 
of each data type are used to calculate each of the ensembles. 
To complicate things, however, one also has to be 
careful when stating that a particular restraint reports on only 
one aspect of the conformational behavior. For example, 
paramagnetic measurements are mainly used to describe 
transient long-range contacts, but the information they 
also provide concerning chain rigidity is usually over- 
looked because it is a more subtle, weaker dependence 
and perhaps a less interesting aspect. The inverse is true 
for RDCs, where the more transient structure is present in 
the ensemble, the more long-range order will affect the 
measured RDCs. CSs have also been used to report on 
transient long-range contacts, provided they are 
measured precisely enough. These different aspects of the 
experimental parameters are outlined in Table 2. 
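
The bookkeeping suggested above (which data types, and how many restraints of each, were used for an ensemble) can be sketched in a few lines of Python. This is only an illustration: the mapping of data types to classes of information is a deliberate simplification of the text, and the names INFO_CLASS and summarize_restraints as well as the example counts are hypothetical and not part of pE-DB.

```python
from collections import Counter

# Assumed, simplified mapping of data type -> class of information.
# Illustrative only; real restraints can report on more than one aspect.
INFO_CLASS = {
    "CS": "local",
    "RDC": "local",
    "SAXS": "global",
    "PFG-Rh": "global",
    "PRE": "long-range contact",
}

def summarize_restraints(restraints):
    """Count restraints per data type and list information classes with no data.

    `restraints` is an iterable of data-type labels, one label per restraint.
    """
    counts = Counter(restraints)
    covered = {INFO_CLASS[t] for t in counts if t in INFO_CLASS}
    missing = sorted(set(INFO_CLASS.values()) - covered)
    return dict(counts), missing

# Hypothetical ensemble refined against chemical shifts and SAXS points only.
counts, missing = summarize_restraints(["CS"] * 120 + ["SAXS"] * 45)
print(counts)   # restraint count per data type
print(missing)  # information classes not covered by any restraint
```

In this hypothetical case the summary would flag that no data on specific long-range contacts were used, which is the kind of limitation a reader should know about before interpreting the ensemble.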

The present size of pE-DB is comparable with the initial 
size of PDB (then Brookhaven Data Bank), which started 
with seven structures in 1971 (20). Considering the importance 
of structural disorder, we expect that pE-DB 
will rapidly grow in size. To this end, we encourage 
researchers to submit their ensembles and the correspond- 
ing primary experimental data. We will also consider 
including additional types of data, such as fluorescence 
resonance energy transfer (FRET) data, which might 
rapidly gain importance in determining dynamic structures 
(50). The database already holds unfolded ensemble(s) 
of globular proteins (29), which may lead to a 
better understanding of protein folding and also help address 
the question as to whether IDPs are fundamentally different 
from denatured states of folded proteins [cf. the term 



Table 2. The type of structural information obtained from the different types of experimental parameters used to calculate pE-DB ensembles 

Experimental parameter                 Major conformational information for IDPs 

NMR CSs                                Local structural propensities (poly-proline II, α-helix and β-strand populations) 

NMR PREs                               Detection of distances between regions distant in primary sequence (one containing a spin-label) 

NMR RDCs                               Local structural propensities 
                                       Cooperativity of secondary structures 
                                       Transient long-range interactions 

NMR spin relaxation (15N, 13C)         Differential rigidity 
                                       Local dynamic timescales and amplitudes 

NMR relaxation dispersion              Characterization of weakly populated states using CSs/RDCs (see above) 
                                       Conformational exchange on micro-millisecond timescales (folding/binding) 

Small angle scattering (SAXS/SANS)     Pairwise distribution function of long-range distances 

FRET                                   Long-range interactions 
'natively denatured proteins' for IDPs (51)]. Furthermore, 
with the development of methods that are able to probe 
molecular motions on the timescale of ps to ns or beyond, 
the deposition of structural ensembles might be of direct 
relevance for structured proteins that populate multiple 
conformational substates in the course of fulfilling their 
biological functions, as in allostery or enzyme catalysis, 
for example (15). 

In conjunction with these goals, the database will help 
establish the quality, reliability and descriptive power of 
structural ensembles. Current ensembles are often 
criticized but never critically evaluated, and the ready 
availability of supporting data in pE-DB will now enable 
development of standard methods for analysis and quality 
control. Three types of analyses can be anticipated. It is 
straightforward to analyze the structural features of en- 
sembles, such as distribution of secondary structure or 
hydrodynamic parameters. More demanding will be to 
establish whether ensembles are realistic in terms of the 
distribution of conformational energies and agreement 
with the primary restraint data. Last but not least, there 
is a fundamental need to understand the connection 
between structural ensembles and protein function. 
Often, arguments about the function of an IDP are 
elaborated on the basis of knowledge of the target- 
bound, folded state, with total neglect of the dynamics 
and structural distribution of the unbound state. 
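
As an illustration of the first, straightforward type of analysis, the sketch below computes the radius of gyration of each conformer and picks the three representatives shown in the image gallery of an entry (lowest Rg, Rg closest to the ensemble average, highest Rg). It is a minimal sketch under stated assumptions: coordinates are plain (x, y, z) tuples, Rg is computed without mass weighting, and the function names and the toy ensemble are hypothetical rather than taken from the pE-DB implementation.

```python
import math

def radius_of_gyration(coords):
    """Unweighted radius of gyration of a list of (x, y, z) points."""
    n = len(coords)
    cx = sum(x for x, _, _ in coords) / n
    cy = sum(y for _, y, _ in coords) / n
    cz = sum(z for _, _, z in coords) / n
    mean_sq = sum((x - cx) ** 2 + (y - cy) ** 2 + (z - cz) ** 2
                  for x, y, z in coords) / n
    return math.sqrt(mean_sq)

def pick_representatives(ensemble):
    """Return indices of the conformers with the lowest Rg, the Rg closest
    to the ensemble mean, and the highest Rg (in that order)."""
    rg = [radius_of_gyration(conf) for conf in ensemble]
    mean_rg = sum(rg) / len(rg)
    idx = range(len(rg))
    lowest = min(idx, key=lambda i: rg[i])
    closest = min(idx, key=lambda i: abs(rg[i] - mean_rg))
    highest = max(idx, key=lambda i: rg[i])
    return lowest, closest, highest

# Toy ensemble (hypothetical): three two-point "conformers" of increasing extension.
toy_ensemble = [
    [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (9.0, 0.0, 0.0)],
]
print(pick_representatives(toy_ensemble))
```

In practice the coordinates would be read from the models of a deposited ensemble file, and the full list of Rg values would give the distribution rather than only three representatives.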

To stimulate further development of the field, we also 
encourage users to recalculate ensembles, deposit them in 
the database and assess the quality of different versions. 
These efforts will all contribute to development and 
acceptance of standardized protocols for quality control, 
for eventual incorporation into the pE-DB data deposition 
pipeline. In the medium- or long-term, we even anticipate 
that a competition analogous to the Critical Assessment of 
Structure Prediction (52) could be implemented for de novo 
calculation of structural ensembles of IDPs. The real transition 
in the life of the database will come when the 
community demands data deposition as a requirement 
of publication; in the digital world, this certainly 
will not take 18 years as in the case of the PDB (20). Either 
way, if we accomplish all these goals, this novel structural 
resource will help to extend the structure-function 
paradigm to include the disordered state of proteins (4) 
and will aid the development of therapeutics for debilitating 
diseases such as cancer and neurodegeneration (21,22). 